Overview

Dataset statistics

Number of variables: 15
Number of observations: 20000
Missing cells: 7344
Missing cells (%): 2.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.3 MiB
Average record size in memory: 120.0 B
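As a worked check on these totals: the only columns with missing values are NOTA_GO (3716) and INGLES (3628), and the percentage is taken over all 20000 × 15 cells, not over rows. The record-size line assumes a shallow memory count of 8 bytes per cell. That matches the 156.2 KiB (≈ 20000 × 8 B) reported for every column, but it is an inference and the report does not state it.

```latex
\text{Missing cells (\%)} = \frac{3716 + 3628}{20000 \times 15} = \frac{7344}{300000} \approx 2.45\% \quad (\text{reported: } 2.4\%)

\text{Average record size} \approx 15 \times 8\ \text{B} = 120\ \text{B},
\qquad 20000 \times 120\ \text{B} = 2\,400\,000\ \text{B} \approx 2.29\ \text{MiB} \quad (\text{reported: } 2.3\ \text{MiB})
```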

Variable types

NUM: 8
CAT: 6
BOOL: 1

Warnings

NOME has high cardinality: 19867 distinct values [High cardinality]
NOTA_EM is highly correlated with NOTA_DE [High correlation]
NOTA_DE is highly correlated with NOTA_EM [High correlation]
NOTA_GO is highly correlated with NOTA_MF [High correlation]
NOTA_MF is highly correlated with NOTA_GO [High correlation]
NOTA_GO has 3716 (18.6%) missing values [Missing]
INGLES has 3628 (18.1%) missing values [Missing]
NOME is uniformly distributed [Uniform]
NOTA_DE has 3575 (17.9%) zeros [Zeros]
NOTA_EM has 3584 (17.9%) zeros [Zeros]
NOTA_MF has 4331 (21.7%) zeros [Zeros]
NOTA_GO has 3537 (17.7%) zeros [Zeros]
H_AULA_PRES has 657 (3.3%) zeros [Zeros]
TAREFAS_ONLINE has 2204 (11.0%) zeros [Zeros]

Reproduction

Analysis started: 2020-10-02 00:34:37.036361
Analysis finished: 2020-10-02 00:34:53.751408
Duration: 16.72 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

MATRICULA
Real number (ℝ≥0)

Distinct: 19770
Distinct (%): 98.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 551148.2714
Minimum: 100003
Maximum: 999995
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 100003
5th percentile: 145747.35
Q1: 326554.25
Median: 550630
Q3: 775524.75
95th percentile: 956802.5
Maximum: 999995
Range: 899992
Interquartile range (IQR): 448970.5

Descriptive statistics

Standard deviation: 259488.7666
Coefficient of variation (CV): 0.4708148062
Kurtosis: -1.192952976
Mean: 551148.2714
Median Absolute Deviation (MAD): 224464.5
Skewness: 0.007501116468
Sum: 1.102296543e+10
Variance: 6.733441998e+10
Monotonicity: Not monotonic
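These summary figures follow standard definitions. The coefficient of variation is the standard deviation divided by the mean. The MAD reported here is the median of the absolute deviations from the median. The IQR is Q3 - Q1. The sketch below computes them with Python's standard library only. It assumes the sample (n - 1) variance and linear-interpolation quartiles, which are the pandas defaults; whether the report uses exactly these conventions is an assumption. `describe` is a hypothetical helper, not a pandas-profiling function.

```python
import statistics


def describe(values):
    """Hypothetical helper: a few of the report's summary statistics.

    Assumes the sample (n - 1) variance and linear-interpolation quartiles.
    """
    data = sorted(values)
    mean = statistics.fmean(data)
    median = statistics.median(data)
    variance = statistics.variance(data)  # sample variance, n - 1 denominator
    std = statistics.stdev(data)          # square root of the sample variance
    # method="inclusive" interpolates linearly between order statistics,
    # treating the minimum as the 0th and the maximum as the 100th percentile.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "mad": statistics.median(abs(x - median) for x in data),  # median absolute deviation
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    for name, value in describe(range(1, 11)).items():
        print(f"{name}: {value:.4f}")
```

Applied to the MATRICULA figures above, 259488.7666 / 551148.2714 ≈ 0.4708 reproduces the reported CV, and 775524.75 - 326554.25 = 448970.5 reproduces the reported IQR.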
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
751223 | 3 | < 0.1%
657605 | 3 | < 0.1%
390727 | 2 | < 0.1%
249902 | 2 | < 0.1%
894414 | 2 | < 0.1%
504812 | 2 | < 0.1%
761243 | 2 | < 0.1%
867004 | 2 | < 0.1%
147715 | 2 | < 0.1%
668008 | 2 | < 0.1%
Other values (19760) | 19978 | 99.9%

Minimum 5 values
Value | Count | Frequency (%)
100003 | 1 | < 0.1%
100004 | 1 | < 0.1%
100022 | 1 | < 0.1%
100058 | 1 | < 0.1%
100118 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
999995 | 1 | < 0.1%
999985 | 1 | < 0.1%
999921 | 1 | < 0.1%
999911 | 1 | < 0.1%
999897 | 1 | < 0.1%

NOME
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 19867
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count | Frequency (%)
Maria da Silva | 5 | < 0.1%
Melissa de Souza | 4 | < 0.1%
Maria dos Santos | 4 | < 0.1%
Kevin da Silva | 3 | < 0.1%
Kiara de Barbosa | 3 | < 0.1%
Bento da Silva | 3 | < 0.1%
Joyce da Silva | 3 | < 0.1%
Simone da Silva | 3 | < 0.1%
Ana de Andrade | 3 | < 0.1%
Eunice da Silva | 3 | < 0.1%
Other values (19857) | 19966 | 99.8%

[Figure: frequencies of value counts]

Unique

Unique: 19749 (values that occur exactly once)
Unique (%): 98.7%

[Figure: histogram of lengths of the category]

Length

Max length: 57
Median length: 24
Mean length: 24.28435
Min length: 8

REPROVACOES_DE
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count | Frequency (%)
0 | 16439 | 82.2%
1 | 2913 | 14.6%
3 | 648 | 3.2%

[Figure: frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: histogram of lengths of the category]

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

REPROVACOES_EM
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count | Frequency (%)
0 | 16439 | 82.2%
1 | 2913 | 14.6%
3 | 648 | 3.2%

[Figure: frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: histogram of lengths of the category]

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

REPROVACOES_MF
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count | Frequency (%)
0 | 15671 | 78.4%
1 | 3517 | 17.6%
3 | 812 | 4.1%

[Figure: frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: histogram of lengths of the category]

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

REPROVACOES_GO
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count | Frequency (%)
0 | 15671 | 78.4%
1 | 3560 | 17.8%
3 | 769 | 3.8%

[Figure: frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: histogram of lengths of the category]

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

NOTA_DE
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 50
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.19656
Minimum: 0
Maximum: 9
Zeros: 3575
Zeros (%): 17.9%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 5.2
Median: 6.2
Q3: 6.7
95th percentile: 7.5
Maximum: 9
Range: 9
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 2.522544812
Coefficient of variation (CV): 0.4854258994
Kurtosis: 0.3543124521
Mean: 5.19656
Median Absolute Deviation (MAD): 0.7
Skewness: -1.392679723
Sum: 103931.2
Variance: 6.363232328
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 3575 | 17.9%
6.4 | 932 | 4.7%
6.5 | 929 | 4.6%
6.3 | 887 | 4.4%
6.6 | 846 | 4.2%
6.2 | 842 | 4.2%
6.7 | 842 | 4.2%
6.1 | 802 | 4.0%
6.8 | 717 | 3.6%
6.9 | 708 | 3.5%
Other values (40) | 8920 | 44.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3575 | 17.9%
4 | 3 | < 0.1%
4.1 | 20 | 0.1%
4.2 | 46 | 0.2%
4.3 | 47 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
9 | 1 | < 0.1%
8.8 | 1 | < 0.1%
8.6 | 4 | < 0.1%
8.5 | 5 | < 0.1%
8.4 | 5 | < 0.1%

NOTA_EM
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 57
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.080285
Minimum: 0
Maximum: 9.4
Zeros: 3584
Zeros (%): 17.9%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 4.9
Median: 5.9
Q3: 6.7
95th percentile: 7.6
Maximum: 9.4
Range: 9.4
Interquartile range (IQR): 1.8

Descriptive statistics

Standard deviation: 2.523928155
Coefficient of variation (CV): 0.4968083788
Kurtosis: 0.1526173614
Mean: 5.080285
Median Absolute Deviation (MAD): 0.9
Skewness: -1.236398587
Sum: 101605.7
Variance: 6.370213329
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 3584 | 17.9%
6.2 | 627 | 3.1%
6.7 | 627 | 3.1%
6.8 | 626 | 3.1%
5.9 | 616 | 3.1%
6 | 604 | 3.0%
6.3 | 592 | 3.0%
6.5 | 591 | 3.0%
6.9 | 584 | 2.9%
6.1 | 577 | 2.9%
Other values (47) | 10972 | 54.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3584 | 17.9%
3.9 | 2 | < 0.1%
4 | 20 | 0.1%
4.1 | 44 | 0.2%
4.2 | 80 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
9.4 | 1 | < 0.1%
9.3 | 1 | < 0.1%
9.2 | 2 | < 0.1%
9.1 | 1 | < 0.1%
9 | 9 | < 0.1%

NOTA_MF
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 69
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.81763
Minimum: 0
Maximum: 11.5
Zeros: 4331
Zeros (%): 21.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 4.7
Median: 5.5
Q3: 6.5
95th percentile: 8.2
Maximum: 11.5
Range: 11.5
Interquartile range (IQR): 1.8

Descriptive statistics

Standard deviation: 2.734775335
Coefficient of variation (CV): 0.567659894
Kurtosis: -0.4579879814
Mean: 4.81763
Median Absolute Deviation (MAD): 0.9
Skewness: -0.8262907215
Sum: 96352.6
Variance: 7.478996133
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 4331 | 21.7%
5 | 670 | 3.4%
5.2 | 665 | 3.3%
5.1 | 665 | 3.3%
5.3 | 655 | 3.3%
5.6 | 650 | 3.2%
5.4 | 648 | 3.2%
5.5 | 636 | 3.2%
5.7 | 596 | 3.0%
5.9 | 591 | 3.0%
Other values (59) | 9893 | 49.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4331 | 21.7%
4.5 | 152 | 0.8%
4.6 | 363 | 1.8%
4.7 | 390 | 1.9%
4.8 | 455 | 2.3%

Maximum 5 values
Value | Count | Frequency (%)
11.5 | 1 | < 0.1%
11.4 | 1 | < 0.1%
11 | 1 | < 0.1%
10.9 | 3 | < 0.1%
10.8 | 8 | < 0.1%

NOTA_GO
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 56
Distinct (%): 0.3%
Missing: 3716
Missing (%): 18.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.534100958
Minimum: 0
Maximum: 10
Zeros: 3537
Zeros (%): 17.7%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 4.5
Median: 5.4
Q3: 6.2
95th percentile: 7.2
Maximum: 10
Range: 10
Interquartile range (IQR): 1.7

Descriptive statistics

Standard deviation: 2.509209393
Coefficient of variation (CV): 0.5534083639
Kurtosis: -0.4288508308
Mean: 4.534100958
Median Absolute Deviation (MAD): 0.8
Skewness: -1.02567161
Sum: 73833.3
Variance: 6.296131778
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 3537 | 17.7%
5.3 | 603 | 3.0%
5.5 | 576 | 2.9%
5.1 | 575 | 2.9%
5.2 | 574 | 2.9%
5.4 | 546 | 2.7%
5.6 | 542 | 2.7%
5 | 535 | 2.7%
5.8 | 517 | 2.6%
5.7 | 508 | 2.5%
Other values (46) | 7771 | 38.9%
(Missing) | 3716 | 18.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3537 | 17.7%
4.1 | 6 | < 0.1%
4.2 | 70 | 0.4%
4.3 | 147 | 0.7%
4.4 | 195 | 1.0%

Maximum 5 values
Value | Count | Frequency (%)
10 | 1 | < 0.1%
9.6 | 1 | < 0.1%
9.5 | 1 | < 0.1%
9.4 | 1 | < 0.1%
9.2 | 1 | < 0.1%

INGLES
Boolean

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 3628
Missing (%): 18.1%
Memory size: 156.2 KiB
Value | Count | Frequency (%)
1 | 10581 | 52.9%
0 | 5791 | 29.0%
(Missing) | 3628 | 18.1%

H_AULA_PRES
Real number (ℝ≥0)

ZEROS

Distinct: 26
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.10295
Minimum: 0
Maximum: 25
Zeros: 657
Zeros (%): 3.3%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 2
Median: 4
Q3: 6
95th percentile: 14
Maximum: 25
Range: 25
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 4.118421951
Coefficient of variation (CV): 0.8070668831
Kurtosis: 3.833488855
Mean: 5.10295
Median Absolute Deviation (MAD): 2
Skewness: 1.850651561
Sum: 102059
Variance: 16.96139937
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=26)]

Common values
Value | Count | Frequency (%)
3 | 3855 | 19.3%
2 | 3723 | 18.6%
5 | 2981 | 14.9%
4 | 2847 | 14.2%
6 | 709 | 3.5%
7 | 706 | 3.5%
1 | 675 | 3.4%
8 | 669 | 3.3%
0 | 657 | 3.3%
9 | 628 | 3.1%
Other values (16) | 2550 | 12.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 657 | 3.3%
1 | 675 | 3.4%
2 | 3723 | 18.6%
3 | 3855 | 19.3%
4 | 2847 | 14.2%

Maximum 5 values
Value | Count | Frequency (%)
25 | 40 | 0.2%
24 | 43 | 0.2%
23 | 40 | 0.2%
22 | 29 | 0.1%
21 | 33 | 0.2%

TAREFAS_ONLINE
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.1403
Minimum: 0
Maximum: 7
Zeros: 2204
Zeros (%): 11.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 3
Q3: 5
95th percentile: 6
Maximum: 7
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.854909147
Coefficient of variation (CV): 0.5906789629
Kurtosis: -0.9080523168
Mean: 3.1403
Median Absolute Deviation (MAD): 1
Skewness: -0.01568274405
Sum: 62806
Variance: 3.440687944
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=8)]

Common values
Value | Count | Frequency (%)
2 | 5141 | 25.7%
5 | 4892 | 24.5%
3 | 2627 | 13.1%
4 | 2386 | 11.9%
0 | 2204 | 11.0%
1 | 1285 | 6.4%
6 | 901 | 4.5%
7 | 564 | 2.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2204 | 11.0%
1 | 1285 | 6.4%
2 | 5141 | 25.7%
3 | 2627 | 13.1%
4 | 2386 | 11.9%

Maximum 5 values
Value | Count | Frequency (%)
7 | 564 | 2.8%
6 | 901 | 4.5%
5 | 4892 | 24.5%
4 | 2386 | 11.9%
3 | 2627 | 13.1%

FALTAS
Real number (ℝ≥0)

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.0606
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Memory size: 156.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 3
Median: 4
Q3: 6
95th percentile: 7
Maximum: 8
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.674714266
Coefficient of variation (CV): 0.4124302483
Kurtosis: -0.5736840042
Mean: 4.0606
Median Absolute Deviation (MAD): 1
Skewness: 0.3731042662
Sum: 81212
Variance: 2.804667873
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=8)]

Common values
Value | Count | Frequency (%)
3 | 7043 | 35.2%
6 | 3579 | 17.9%
4 | 2866 | 14.3%
5 | 2454 | 12.3%
2 | 1670 | 8.3%
1 | 963 | 4.8%
7 | 828 | 4.1%
8 | 597 | 3.0%

Minimum 5 values
Value | Count | Frequency (%)
1 | 963 | 4.8%
2 | 1670 | 8.3%
3 | 7043 | 35.2%
4 | 2866 | 14.3%
5 | 2454 | 12.3%

Maximum 5 values
Value | Count | Frequency (%)
8 | 597 | 3.0%
7 | 828 | 4.1%
6 | 3579 | 17.9%
5 | 2454 | 12.3%
4 | 2866 | 14.3%

PERFIL
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count | Frequency (%)
EXATAS | 8230 | 41.1%
DIFICULDADE | 7001 | 35.0%
HUMANAS | 3196 | 16.0%
MUITO_BOM | 902 | 4.5%
EXCELENTE | 671 | 3.4%

[Figure: frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: histogram of lengths of the category]

Length

Max length: 11
Median length: 7
Mean length: 8.146
Min length: 6

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors; a negative factor flips its sign), so for an exact linear relationship the slope of the line, i.e. its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
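A minimal sketch of this definition in Python, using only the standard library (`pearson_r` is an illustrative name, not a pandas-profiling API). The example also exercises the invariance claim: shifting or positively rescaling either variable leaves r unchanged, and any exact straight line gives r = ±1 regardless of its slope.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: cov(X, Y) / (std(X) * std(Y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n - 1) factors in the covariance and in both standard
    # deviations cancel, so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 1, 4, 3, 5]
    print(pearson_r(x, y))
    # Shifting and positively rescaling either variable leaves r unchanged.
    print(pearson_r([10 * v - 3 for v in x], [0.1 * v + 100 for v in y]))
    # An exact line gives r = +1 or -1 whatever its slope.
    print(pearson_r(x, [7 + 2 * v for v in x]), pearson_r(x, [-0.5 * v for v in x]))
```

With these inputs the first call prints 0.8, and the second agrees with it up to floating-point rounding.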

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
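Following this definition, ρ can be sketched as Pearson's r applied to ranks. This standard-library version assumes the common convention of giving tied values the average of the ranks they span; the function names are illustrative. The example uses y = x³, which is monotonic but not linear, so ρ is exactly 1 while r falls short of 1.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` to the last position of the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [v ** 3 for v in x]  # monotonic but not linear
    print(spearman_rho(x, y), pearson_r(x, y))  # rho is 1.0; r is below 1
```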

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

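To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant when X and Y order its two observations the same way, and discordant when they order them in opposite ways. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations.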
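A direct O(n²) sketch of this pair-counting definition, using only the standard library (the function name is illustrative). With the total number of pairs, n(n - 1)/2, as the denominator, this is the variant usually called τ-a, and tied pairs count as neither concordant nor discordant. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so results can differ on columns with many repeated values, such as the grade columns in this dataset.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: X and Y order the pair the same way.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


if __name__ == "__main__":
    # 5 concordant pairs and 1 discordant pair out of 6: (5 - 1) / 6
    print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```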

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
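A minimal standard-library sketch of Bergsma's bias correction as it is commonly stated. φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1), the table dimensions are shrunk in the same spirit, and V is the square root of the corrected φ² divided by the smaller corrected dimension minus one. The function name is hypothetical, and pandas-profiling's own implementation may differ in details such as applying a continuity correction to 2 × 2 tables.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V (Bergsma 2013) for an r x k table of counts.

    Assumes every row total and every column total is positive.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    # Pearson's chi-squared statistic against independence (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bergsma's corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


if __name__ == "__main__":
    print(cramers_v_corrected([[10, 20], [30, 60]]))  # independent rows: 0.0
    print(cramers_v_corrected([[50, 0], [0, 50]]))     # perfect association: about 1
```

The `max(0.0, ...)` clamp is what keeps the corrected estimate from going negative when the observed association is weaker than the bias term.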

Missing values

Sample

First rows

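Index | MATRICULA | NOME | REPROVACOES_DE | REPROVACOES_EM | REPROVACOES_MF | REPROVACOES_GO | NOTA_DE | NOTA_EM | NOTA_MF | NOTA_GO | INGLES | H_AULA_PRES | TAREFAS_ONLINE | FALTAS | PERFIL
0 | 502375 | Márcia Illiglener | 0 | 0 | 0 | 0 | 6.2 | 5.8 | 4.6 | 5.9 | 0.0 | 2 | 4 | 3 | EXATAS
1 | 397093 | Jason Jytereoman Izoimum | 0 | 0 | 0 | 0 | 6.0 | 6.2 | 5.2 | 4.5 | 1.0 | 2 | 4 | 3 | EXATAS
2 | 915288 | Bartolomeu Inácio da Gama | 0 | 0 | 0 | 0 | 7.3 | 6.7 | 7.1 | 7.2 | 0.0 | 5 | 0 | 3 | HUMANAS
3 | 192652 | Fernanda Guedes | 1 | 3 | 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 4 | 4 | 4 | DIFICULDADE
4 | 949491 | Alessandre Borba Gomes | 1 | 3 | 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 5 | 2 | 5 | DIFICULDADE
5 | 627360 | Magali Hellen Gejibaflião | 0 | 0 | 0 | 0 | 7.3 | 7.4 | 7.6 | 6.5 | 1.0 | 5 | 3 | 5 | HUMANAS
6 | 804493 | Tiago Brisu Pires | 0 | 0 | 0 | 0 | 5.8 | 6.0 | 7.3 | 5.1 | 1.0 | 5 | 2 | 6 | DIFICULDADE
7 | 433789 | Andressa Gabrielle da Silva | 0 | 0 | 0 | 0 | 4.9 | 5.0 | 5.9 | 4.6 | NaN | 2 | 2 | 6 | DIFICULDADE
8 | 178335 | Gilmar Oséas Etonvic | 0 | 0 | 0 | 0 | 4.4 | 4.8 | 4.7 | 4.6 | 1.0 | 3 | 4 | 4 | DIFICULDADE
9 | 987229 | Otávia Mônica Noopu | 0 | 0 | 0 | 0 | 6.4 | 5.4 | 5.0 | 5.5 | 1.0 | 3 | 5 | 3 | EXATAS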

Last rows

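Index | MATRICULA | NOME | REPROVACOES_DE | REPROVACOES_EM | REPROVACOES_MF | REPROVACOES_GO | NOTA_DE | NOTA_EM | NOTA_MF | NOTA_GO | INGLES | H_AULA_PRES | TAREFAS_ONLINE | FALTAS | PERFIL
19990 | 170643 | Celso Eric da Lira | 1 | 1 | 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 2 | 4 | 5 | DIFICULDADE
19991 | 259981 | Maria Charlene Dadamu | 0 | 0 | 0 | 0 | 5.5 | 5.2 | 5.5 | 4.3 | 1.0 | 4 | 5 | 1 | EXATAS
19992 | 991838 | Lucas de Drummond Gamdoz | 0 | 0 | 0 | 0 | 6.3 | 5.6 | 5.7 | 5.4 | 1.0 | 2 | 2 | 3 | EXATAS
19993 | 489491 | Zenaide Dace de Britto | 0 | 0 | 3 | 1 | 6.4 | 6.1 | 0.0 | 0.0 | 1.0 | 7 | 2 | 6 | DIFICULDADE
19994 | 876548 | Nair Jeniffer da Silva Dias | 1 | 1 | 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 20 | 4 | 4 | DIFICULDADE
19995 | 856673 | Laércio Mário da Silva | 0 | 0 | 0 | 0 | 7.0 | 7.9 | 5.8 | 7.0 | 1.0 | 9 | 5 | 6 | EXATAS
19996 | 576100 | Cibele Quésia Poza | 1 | 1 | 1 | 1 | 0.0 | 0.0 | 0.0 | NaN | 1.0 | 3 | 2 | 5 | DIFICULDADE
19997 | 888739 | Marcielle Chale Bape | 0 | 0 | 0 | 0 | 7.9 | 7.6 | 8.3 | 7.2 | NaN | 8 | 3 | 1 | EXCELENTE
19998 | 722743 | Suzanne Mirian Mourão | 0 | 0 | 1 | 1 | 6.3 | 5.1 | 0.0 | 0.0 | 1.0 | 3 | 2 | 6 | DIFICULDADE
19999 | 417268 | Maria Isaiane da Silva Luwequisman | 0 | 0 | 1 | 1 | 7.0 | 7.3 | 0.0 | 0.0 | NaN | 3 | 0 | 6 | DIFICULDADE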